Abstract


Artificial neural networks can perform reliable classification of ground
vehicles based solely on their acoustic signatures, if robust features can be
identified. We present feature extraction and classification results using
simple power spectrum estimates, harmonic line association, and principal
component analysis. Algorithm implementation and performance analysis
of each feature extraction method are discussed. Also given are preliminary
evaluation results of a VLSI (very-large-scale integration) device dedicated
to neural network implementation.
1.  Introduction


It should be possible to use artificial neural networks (ANNs) to classify
tracked and wheeled vehicles based solely on their acoustic signatures.
The main problem faced in classification is the selection of proper feature
vectors that will be stable and class specific. Acoustic signatures are
typically nonstationary [1,2] and are often corrupted by propagation effects,
noise, and interference from the environment [1,3,4]. A robust feature
extraction technique must be, to some degree, tolerant of these issues in
order to be reliable. We have investigated three feature extraction
techniques: simple power spectrum estimates (PSEs), harmonic line association
(HLA) techniques [4], and principal component analysis (PCA) [5-7].
Algorithm implementation and performance analysis for these techniques
are discussed and compared. Also given are preliminary evaluation results
of a VLSI (very-large-scale integration) chip dedicated to neural networks.

1.1  Feature  Extraction 

Fundamentally, feature extraction and selection involve choosing those
features of a class of patterns (whether waveforms, images, or geometric
shapes) that will maintain class separability under the constraint of some
criterion function. Feature extraction is a mapping of the original
n-dimensional measurements into an m-dimensional feature space (n > m). In
theory, the Bayes error [5,7,8] is the optimum measure of a feature's
effectiveness, but it is difficult to obtain. One would need to perform
nonparametric density estimation [5,7,8], a very time-consuming task, to
obtain the posterior probabilities and in turn the Bayes error. Often in
practice, feature extraction for representation is different from that for
classification: features used for representation can be suboptimal for
classification since they are not based on class separability [5]. The
criterion used frequently for systematic feature extraction (Fukunaga's
separability criterion) is based on a family of scatter matrices that can
measure the class separability and generate optimal transformation matrices.
This criterion can be applied by ANNs with the correct training algorithm
[6,9-12]. The main problem is in its implementation under current system
constraints; therefore, we use simpler analytical tools to evaluate the
feature space.

1.2  Feature  Extraction  Techniques 

The spectral characteristics of vehicle noise are distinctive: the acoustic
signatures are dominated by narrow-band spectral peaks, since the physical
processes producing these sounds (engine firing rate and track slap) are
periodic [3]. Spectral methods are amenable to calculation because of their
simplicity and the existence of fast algorithms. These spectral lines, the
first feature space considered, should present a good feature vector for
classification and have in fact been used in the past for classification based
on simple clustering techniques [4], for hierarchical clustering, and as
inputs to an ANN [13]. The spectral peaks are typically bandlimited
between 0 and 400 Hz, but peak components occur between 10 and 120 Hz.

A second feature space based on HLA [4] allows one to reduce the feature
space considerably and should not appreciably reduce the separability of
the various classes of vehicles [5].

A third feature space that holds promise is the principal components. PCA
is a well-established technique in feature selection for both representation
and classification [5,6,9]. PCA has a high degree of energy compaction: it
transforms the original space into an uncorrelated space in which most of
the energy is concentrated in a few components, so the dimension of the
feature space can be reduced. PCA is closely related to the Karhunen-Loeve
transform, which is known to be the optimal transform method for signal
representation [14,15]. Principal components are derived by the following
set of relationships. Let

$$\mathbf{u}(n) = [u(t), u(t+1), \ldots, u(t+N-1)]^T \qquad (1)$$

be an N x 1 random input vector and assume zero mean without loss of
generality. Let R be the N x N correlation matrix of the data with
eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_N$. The k principal
components are defined by the linear transformation

$$\mathbf{C}(n) = \Phi^T \mathbf{u}(n) , \qquad (2)$$

where $\mathbf{C}(n) = [c_1, c_2, \ldots, c_k]^T$ is a k x 1 principal
component vector, and $\Phi$ is an N x k matrix whose columns are the k
eigenvectors associated with the k largest eigenvalues of R. We choose the
value k arbitrarily, where k is significantly less than N, thus reducing the
dimensionality of the original input space. The correlation matrix of the
newly formed principal components is a diagonal matrix; thus the principal
components are uncorrelated [6,19]. The principal components are optimal in
a mean square sense and have removed redundancy associated with the original
measurement. A motivation for using principal components is that data that
exhibit a high degree of correlation from sample to sample may allow fast
algorithms to implement PCA. Also, several researchers in neural networks
[6,9-11,15,16] have derived learning algorithms to implement PCA. Finally,
work in perfect reconstruction filter banks [17] leads one to believe that it
may be possible to employ PCA in "real time."
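
The relationships in equations (1) and (2) can be sketched numerically. The following is a minimal illustration, not the implementation used in this work: the function name `principal_components`, the synthetic data, and the choice k = 3 are assumptions made only for the example. It forms a sample correlation matrix, keeps the eigenvectors of the k largest eigenvalues, and projects the data.

```python
import numpy as np

def principal_components(frames, k):
    """Project frames (one N-sample vector per row) onto k leading eigenvectors."""
    frames = frames - frames.mean(axis=0)      # enforce the zero-mean assumption
    R = frames.T @ frames / frames.shape[0]    # N x N sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    Phi = eigvecs[:, order]                    # N x k transformation matrix
    C = frames @ Phi                           # row n holds C(n)^T = (Phi^T u(n))^T
    return C, Phi, eigvals[order]

# Hypothetical data: 500 frames of N = 16 samples driven by 3 latent sources.
rng = np.random.default_rng(0)
mixing = rng.standard_normal((3, 16))
frames = rng.standard_normal((500, 3)) @ mixing
frames += 0.01 * rng.standard_normal((500, 16))

C, Phi, lam = principal_components(frames, k=3)
corr_C = C.T @ C / C.shape[0]                  # correlation matrix of the components
```

In this sketch `corr_C` comes out diagonal to within rounding, with the retained eigenvalues on the diagonal, which is the uncorrelatedness property stated above.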

1.3  Artificial  Neural  Networks

ANNs are currently in use by ARL for classification of ground and air
targets. In target classification, the ANN can not only aid in providing
information about the target class, but also give a measure of one's
confidence in the decision. ANNs derive their computational power from their
parallel distributed structure and ability to learn. Because neurons are
basically nonlinear devices, the ANN will be nonlinear; nonlinearity is a
very important property in light of the input signal structure, which is
nonstationary and perhaps nonlinear.

The backpropagation ANN derives its name from the error-correction rule
used in its training. Basically, the error backpropagation consists of two
passes through the network. The forward pass takes the input vector and
computes an activity pattern that propagates through the network, layer
by layer. During the forward pass, all the synaptic weights remain fixed. In
the backward pass, the synaptic weights are adjusted by an error correction
rule that is fundamentally the same for both hidden neurons and output
neurons and is based on stochastic gradient descent [9,19]. Each neuron
has the general structure shown in figure 1. The overall architecture for
the network is shown in figure 2.

From figure 1, the governing expression for the output of a single neuron is
the summation of weighted inputs:

$$v_j(n) = \sum_{i=0}^{p} w_{ji}(n)\, x_i(n) , \qquad (3)$$

where

$$y_j(n) = \varphi_j(v_j(n)) \qquad (4)$$

is the output of the jth neuron.


Figure 1. General neuron structure.

Here $\varphi_j(\cdot)$ is the sigmoidal activation function given by

(5)

an important approximation to hardlimiting. The sigmoidal activation
function is differentiable, which facilitates the weight update given by the
delta rule general expression

$$\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n) ; \qquad (6)$$

here $\eta$ is the learning rate parameter, $\delta_j(n)$ is the local
gradient (which depends on whether neuron j is in the output layer or the
hidden layer), and $y_i(n)$ is the input to the jth neuron. The local
gradient points to the required changes in the synaptic weights; in the
output layer, the local gradient has the form

$$\delta_j(n) = e_j(n)\, \varphi_j'(v_j(n)) , \qquad (7)$$

with the error signal $e_j(n)$ given by

$$e_j(n) = d_j(n) - y_j(n) , \qquad (8)$$

where $d_j(n)$ is the desired signal.

The picture is more complex for the weight updates in the hidden layers;
here the local gradient is dependent on all the errors associated with the
neurons in the output layer when only one layer is hidden.

The local gradient for a hidden layer is given by

$$\delta_j(n) = \varphi_j'(v_j(n)) \sum_k \delta_k(n)\, w_{kj}(n) , \qquad (9)$$

with $\delta_k(n)$ derived from the error signals associated with the k
output neurons connected to the jth hidden neuron. These equations represent
the general backpropagation algorithm and do not include the refinements
available for a more robust network.
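
The forward and backward passes can be sketched for a network with one hidden layer. This is a minimal illustration under stated assumptions, not the software used in this work: the logistic form of the activation is assumed (the text specifies only that it is sigmoidal), the i = 0 term of the weighted sum is treated as a bias input, and the function names, learning rate, and data are hypothetical.

```python
import numpy as np

def sigmoid(v):
    # Assumed logistic form; the text states only that the activation is sigmoidal.
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(x, d, W_h, W_o, eta):
    """One forward pass and one backward pass for a single training example."""
    x_b = np.append(1.0, x)                   # index 0 carries the bias input
    y_h = sigmoid(W_h @ x_b)                  # weighted sum and activation, hidden layer
    y_hb = np.append(1.0, y_h)
    y_o = sigmoid(W_o @ y_hb)                 # weighted sum and activation, output layer
    e = d - y_o                               # error signal: desired minus actual
    delta_o = e * y_o * (1.0 - y_o)           # output local gradient (logistic derivative)
    # Hidden local gradient: derivative times the back-propagated output gradients.
    delta_h = y_h * (1.0 - y_h) * (W_o[:, 1:].T @ delta_o)
    W_o = W_o + eta * np.outer(delta_o, y_hb) # delta rule, output weights
    W_h = W_h + eta * np.outer(delta_h, x_b)  # delta rule, hidden weights
    return W_h, W_o, 0.5 * float(e @ e)

# Hypothetical example: 11 inputs, 15 hidden nodes, 4 output nodes.
rng = np.random.default_rng(0)
W_h = rng.uniform(-0.01, 0.01, size=(15, 12))
W_o = rng.uniform(-0.01, 0.01, size=(4, 16))
x = rng.standard_normal(11)
d = np.array([0.0, 1.0, 0.0, 0.0])

errors = []
for _ in range(200):
    W_h, W_o, err = backprop_step(x, d, W_h, W_o, eta=0.5)
    errors.append(err)
```

Repeated presentation of the same example drives the squared error down, which is the behavior the delta rule is meant to produce; a practical network would cycle through many examples per epoch.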


2.  Procedure 

2.1  Data  Collection 

RNADS (Remote Netted Acoustic Detection System) [1], a remote sensor
architecture, was used to gather acoustic data from ground vehicles at
Grayling, Michigan, and Aberdeen Proving Ground, Maryland. The vehicles
included three tracked vehicles (classes 0, 1, and 3) and one wheeled vehicle
(class 2), all powered by 12-cylinder diesel engines (see table 1). The
remote sensor consists of an 8-ft-diameter circular array of Knowles BL1994
ceramic microphones, with six microphones placed along the perimeter
and a seventh microphone at the center of the array. This array baseline
provides good directivity at low frequencies. Figure 3 shows the RNADS
sensor and processing architecture.
Table 1. Classes of vehicles.

Class   Vehicle
0       12-cylinder diesel, tracked vehicle
1       12-cylinder diesel, tracked vehicle
2       12-cylinder diesel engine, heavy wheeled vehicle
3       12-cylinder diesel, tracked vehicle

The acoustic signals were preamplified with a selectable gain of 40 or
60 dB and passed to a ruggedized personal computer (PC) and a digital
audio tape (DAT) recorder. The DAT recorder sampled the acoustic
signatures at a 2-kHz rate, well above the Nyquist rate. Within the PC,
acoustic signals are anti-aliased with a lowpass filter, fed to 16-bit
analog-to-digital converters, and further processed with a pair of
commercially available digital signal processing boards for real-time
applications.

2.2  Feature  Extraction  Methods 

We  generated  PSEs  for  each  1-s  interval  of  data  using  Hanning  windowed 
short-time  Fourier  transforms  according  to  the  Welch  method  [18].  We 
used  the  first  200  frequency  bins  derived  from  the  power  spectrum  in  the 
1  to  200  Hz  range  for  classification  in  the  ANN. 
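
A PSE of this kind can be sketched with NumPy alone. The sketch below is illustrative and rests on assumptions not stated in the text: a 2-kHz input, 1000-sample segments with 50-percent overlap, and zero-padding each segment to one second so that bins fall on a 1-Hz grid. The function name `welch_pse` and the two-tone test signal are hypothetical, and the spectrum is correct only up to a constant scale factor.

```python
import numpy as np

def welch_pse(x, fs, nperseg, nfft):
    """Average Hanning-windowed periodograms over half-overlapping segments."""
    window = np.hanning(nperseg)
    step = nperseg // 2                          # 50-percent overlap (assumed)
    scale = fs * np.sum(window ** 2)             # window power normalization
    spectra = []
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg]
        seg = (seg - seg.mean()) * window        # remove the mean, then taper
        spectra.append(np.abs(np.fft.rfft(seg, n=nfft)) ** 2 / scale)
    return np.mean(spectra, axis=0)

fs = 2000                                        # assumed sampling rate in Hz
t = np.arange(fs) / fs                           # one second of data
x = np.sin(2 * np.pi * 40 * t) + 0.5 * np.sin(2 * np.pi * 80 * t)

pse = welch_pse(x, fs, nperseg=1000, nfft=fs)    # nfft = fs gives 1-Hz bin spacing
features = pse[1:201]                            # the 200 bins at 1 to 200 Hz
peak_hz = 1 + int(np.argmax(features))           # strongest line in the feature vector
```

The 200-element `features` vector plays the role of the PSE input to the ANN; with the synthetic signal above its largest entry falls at the 40-Hz line.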

A second technique used was selecting only those peaks that were
"harmonically related." An HLA algorithm was developed by Robertson and
Weber [4] to create harmonic line sets for each second of data samples. This
algorithm takes the strongest peak P in the frequency peak set subject to the
constraint $P/k \in [8, 20]$ Hz, assumes that this peak is some kth harmonic
line of the fundamental frequency, and then calculates the total signal
strength in that HLA set. The integer value k that gives the maximum
signal strength is assumed to be the correct harmonic line number, and a
total of 11 harmonic lines are retained as the feature vector. This technique
has the advantage of normalizing the feature vector, since the feature is
based on harmonic line number and is not a function of frequency.
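
The harmonic matching step can be sketched as follows. This is a simplified illustration and not the Robertson and Weber algorithm itself: it assumes the [8, 20] Hz constraint applies to the implied fundamental P/k, it picks the spectrum bin nearest each predicted harmonic, and the function name, the 1-Hz frequency grid, and the synthetic line spectrum are hypothetical.

```python
import numpy as np

def harmonic_line_association(freqs, power, f0_range=(8.0, 20.0), n_lines=11):
    """Return (k, f0, lines) for the harmonic set with the largest total power."""
    P = freqs[int(np.argmax(power))]              # strongest peak in the spectrum
    k_max = int(P // f0_range[0])                 # largest k keeping P/k >= lower bound
    best = None
    for k in range(1, k_max + 1):
        f0 = P / k                                # candidate fundamental if P is line k
        if not (f0_range[0] <= f0 <= f0_range[1]):
            continue
        harmonics = f0 * np.arange(1, n_lines + 1)
        idx = [int(np.argmin(np.abs(freqs - h))) for h in harmonics]
        strength = float(power[idx].sum())        # total signal strength of the set
        if best is None or strength > best[0]:
            best = (strength, k, f0, power[idx])
    if best is None:
        raise ValueError("no harmonic number places the fundamental in range")
    return best[1], best[2], best[3]

# Hypothetical spectrum: lines at multiples of 12 Hz, strongest at 36 Hz.
freqs = np.arange(0.0, 401.0)                     # 1-Hz grid from 0 to 400 Hz
power = np.full(freqs.shape, 1e-3)
for m in range(1, 12):
    power[12 * m] = 1.0
power[36] = 5.0

k, f0, lines = harmonic_line_association(freqs, power)
```

Because `lines` is indexed by harmonic number rather than by frequency, two recordings of the same vehicle at different engine speeds map onto the same 11 positions, which is the normalizing property noted above.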

To calculate the principal components, we downsampled the data to
512 Hz and then divided them into 512/N subblocks of N samples (N = 64
or 128) for each data snapshot. The data were then used to generate a set of
instantaneous autocorrelation matrix estimates (see eq (10) to (12)) [19].
Although subblock sizes of 64 and 128 samples were used in generating the
correlation matrix estimates, we report only the results with 64-sample
subblock sizes. The 128-sample subblocks gave similar classification
results in preliminary training and testing of the neural network. The
correlation matrix estimates were used for eigenanalysis; the 11 eigenvectors
associated with the 11 largest eigenvalues were used to transform the
original data vector (64 samples) from each subblock to produce the principal
components for that subblock. The principal components generated for each
subblock were then averaged to produce an averaged principal component
feature vector over the 512-sample block (see eq (13)). The entire procedure
was repeated, shifting 256 samples (50-percent overlap) and forming a
new 512-sample block for processing. We wanted to compare the
performance of the PCA and HLA feature spaces in the classification scheme,
so only 11 PCA features were retained; it was also necessary to generate one
PCA feature vector per second of data sampled.

Figure 3. Field sensor and processing architecture.

The estimation process for a correlation matrix is based on the following
data matrix formulation:

$$A^H = [\mathbf{u}(M), \mathbf{u}(M+1), \ldots, \mathbf{u}(N)] , \qquad (10)$$

with the vector $\mathbf{u}(i)$ given by

$$\mathbf{u}(i) = [u(i), u(i-1), \ldots, u(i-M+1)]^T , \qquad (11)$$

with the indices i falling in the range [M, N].

Therefore, the data matrix $A^H$ is an M by N - M + 1 rectangular Toeplitz
matrix.

Then the estimation of matrix R is performed by

$$\hat{R} = \frac{1}{2(N-M)} A^H A , \qquad (12)$$

and here the estimate $\hat{R}$ will be an M by M matrix. The values used for
actual processing of the PCA feature vectors were M = 32, 64 and N = 64, 128.
The estimate of the principal components for the 512-sample block was
generated from

$$\bar{\mathbf{C}} = \frac{1}{J} \sum_{j=1}^{J} \mathbf{C}_j(n) , \qquad (13)$$

with J = 512/N and the terms within the summation derived from equation
(2).
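
The estimation chain above can be sketched end to end. The sketch is illustrative only and makes several assumptions: M = 32 and N = 64 are taken as a pair, the normalization 1/(2(N - M)) is applied to the product of the data matrix with its transpose, and each subblock is projected using its M most recent samples, since the text does not spell out how a 64-sample subblock is projected with M x M eigenvectors. All function names and the test signal are hypothetical, and the arbitrary sign of each eigenvector is left unresolved.

```python
import numpy as np

def data_matrix(u, M):
    """M x (N-M+1) matrix whose columns are [u(i), u(i-1), ..., u(i-M+1)]^T."""
    N = len(u)
    return np.column_stack([u[i - M:i][::-1] for i in range(M, N + 1)])

def correlation_estimate(u, M):
    """M x M correlation matrix estimate from one N-sample subblock (real data)."""
    AH = data_matrix(u, M)
    return AH @ AH.T / (2.0 * (len(u) - M))

def averaged_principal_components(block, N=64, M=32, k=11):
    """Average the k principal components over the J = len(block)/N subblocks."""
    J = len(block) // N
    pcs = []
    for j in range(J):
        sub = block[j * N:(j + 1) * N]
        eigvals, eigvecs = np.linalg.eigh(correlation_estimate(sub, M))
        Phi = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # M x k leading eigenvectors
        pcs.append(Phi.T @ sub[-M:][::-1])                # project most recent tap vector
    return np.mean(pcs, axis=0)

# Hypothetical 512-sample block at 512 Hz: a 30-Hz tone plus noise.
rng = np.random.default_rng(1)
t = np.arange(512) / 512.0
block = np.sin(2 * np.pi * 30 * t) + 0.1 * rng.standard_normal(512)

AH = data_matrix(block[:64], 32)
R = correlation_estimate(block[:64], 32)
feature = averaged_principal_components(block)
```

Each column of `AH` is the previous column shifted by one sample, which is what makes the matrix Toeplitz; the 11-element `feature` corresponds to one averaged feature vector per 512-sample block.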


2.3  Backpropagation  Neural  Network

The backpropagation neural network (BPNN) was trained by repeated
presentation of examples of a particular input/output class with a
subsequent adjustment to the synaptic weights based on the difference between
the desired and the actual output. This process is repeated until the
user-set exit criterion is met for termination of the training procedure.
Three different statistics can be used as exit criteria to terminate the
training of the BPNN. The first is the number of epochs, which is a constant
number of iterations assigned for training before training begins. The second
is based on the mean square error (MSE), which is a general measure of the
performance of a given neural network model for a given data set. The third
exit criterion in the BPNN is based on the R-squared statistic, which is the
proportion of variability in the target data set that is explained by the
input variables.

We used the epoch training as our exit criterion with a value of 1000
iterations. The learning rate parameter was set to 0.0005 and was
automatically adjusted downward by an annealing divisor of 1.1. This
adaptation of the learning rate allows fine-grain adjustments during the
training. The maximum initial weights of the network were set to 0.01, and a
random number generator was used to initialize these weights so that the
network will avoid starting near a local minimum or an undesirable initial
weight position. Further refinements to the learning rate were accomplished
through an interlayer multiplier, which only affected the learning rate of
the hidden neurons. The interlayer multiplier will cause the hidden nodes to
be more sensitive to learning and thus improve the speed of learning.

Finally, smoothing was incorporated in the rate of learning for the BPNN.
Smoothing can be highly beneficial to the learning behavior of the neural
network [9]; it allows control of the weight adjustment based on the past
values of gradient descent and can prevent the training process from
terminating in a shallow local minimum. The greater the smoothing factor,
the greater the influence of past adjustments and the smoother the
migration of weights. A smoothing constant value of 0.9 was used in our
neural network, which means that 90 percent of the weight adjustment is
governed by the average of the past directions of gradient descent, and
10 percent by the current direction of gradient descent. This is the default
smoothing constant in the Database Mining Workstation [13], a commercially
available software package for unearthing and evaluating data
characteristics using BPNNs. The data sets were divided into training and
testing blocks for this purpose. Training sample sets were composed of 75,
67, and 50 percent of the data set.
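
The smoothing and annealing described above can be sketched as an update rule. This is one plausible reading and not the Database Mining Workstation implementation: the weight change is taken as a 0.9/0.1 blend of the previous change and the current gradient step, and the learning rate is divided by 1.1 every 100 epochs, a schedule that is assumed because the text does not say when the divisor is applied. The quadratic error surface is a hypothetical stand-in for the network error.

```python
import numpy as np

def smoothed_update(w, grad_step, prev_delta, smoothing=0.9):
    """Blend the previous weight change with the current gradient step."""
    delta = smoothing * prev_delta + (1.0 - smoothing) * grad_step
    return w + delta, delta

def anneal(eta, divisor=1.1):
    """Lower the learning rate by the annealing divisor."""
    return eta / divisor

# Hypothetical error surface E(w) = 0.5 * ||w||^2, whose gradient is w itself.
w0 = np.array([1.0, -2.0, 0.5])
w = w0.copy()
delta = np.zeros_like(w)
eta = 0.0005                                   # initial learning rate from the text

for epoch in range(1, 1001):                   # 1000 epochs, as in the exit criterion
    grad_step = -eta * w                       # step along the negative gradient
    w, delta = smoothed_update(w, grad_step, delta)
    if epoch % 100 == 0:                       # assumed annealing schedule
        eta = anneal(eta)
```

With a smoothing constant of 0.9, a single noisy gradient contributes only one tenth of its step immediately, and the remainder decays in over later updates, which is what smooths the migration of the weights.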

The BPNN classifier was used to calculate the percentage of correct
identification of ground vehicles, and in some cases the confidence levels
were also generated. A confusion matrix was calculated that provides the
percentage of correct identification ($C_{ID}$) for each class of ground
vehicles based on

$$C_{ID} = \frac{N_P}{N_T} . \qquad (14)$$

Here $N_P$ is the total number of correct predicted values and $N_T$ is the
total number of observations. Confidence levels for the classification of
each target were calculated by

$$CL_i = \frac{1}{2(L-1)} \sum_{j \ne i} \left( P_{CID_i} - P_{FID_j} \right) , \qquad (15)$$

where L is the total number of output classes, $P_{CID_i}$ is the predicted
value of correct identification for class i, and $P_{FID_j}$ is the predicted
value of false identification for class j with respect to class i. Thus
$P_{CID_i}$ is the output of the neural network output node dedicated to a
particular class, and $P_{FID_j}$ is the output of each of the other output
nodes.
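
Both scores can be sketched in a few lines. The sketch is illustrative: the confusion matrix is expressed as row percentages, and the confidence level is computed as the sum of differences between the output of the node for class i and the outputs of the other nodes, divided by 2(L - 1), which is an assumed reading of the confidence expression. The function names, labels, and node outputs are hypothetical, and the node outputs are assumed to lie between -1 and 1.

```python
import numpy as np

def confusion_percent(actual, predicted, n_classes):
    """Row-normalized confusion matrix: percent of each actual class per net output."""
    counts = np.zeros((n_classes, n_classes))
    for a, p in zip(actual, predicted):
        counts[a, p] += 1
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)

def confidence_level(outputs, i):
    """Confidence for class i from its own node output versus the other nodes."""
    L = len(outputs)
    others = np.delete(outputs, i)
    return float(np.sum(outputs[i] - others) / (2.0 * (L - 1)))

# Hypothetical labels for ten test vectors drawn from four classes.
actual = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3]
predicted = [0, 0, 0, 3, 1, 1, 2, 2, 3, 0]
cm = confusion_percent(actual, predicted, n_classes=4)

# Hypothetical output-node activations for one vector classified as class 0.
conf = confidence_level(np.array([0.9, -0.8, -0.9, -0.7]), i=0)
```

The diagonal of `cm` holds the percentage of correct identification per class, and the off-diagonal entries show where the misclassified vectors went.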

3.  Results 

3.1  Confusion  Matrices 

Table 2 shows the confusion matrices for testing the PSE, HLA, and PCA
features on the trained BPNN for each feature space. The numbers are
percentages; the diagonal entries give the percentage of correct
identification for each class. The BPNN used had one input layer of 11 or
200 input nodes, one hidden layer of 15 nodes, and one output layer of
4 nodes. Table 1 (sect. 2) shows the general class characteristics
of classes 0, 1, 2, and 3.

The scores in table 2 are representative of several trials for each BPNN and
type of feature vector investigated; the values are rounded to the nearest
integer. Rows may not sum to exactly 100 percent because of roundoff error.


Table 2. Testing results for trained BPNN. Confusion matrices according to
percentage trained/tested; rows are actual classes, columns are net outputs.

Feature   Actual |      75/25       |      67/33       |      50/50
space     class  |   0   1   2   3  |   0   1   2   3  |   0   1   2   3
----------------------------------------------------------------------------
PSE         0    |  93   0   3   3  |  90   0   4   4  |  93   0   0   4
            1    |   0  96   2   1  |   0  98   0   1  |   0  93   2   2
            2    |   3   0  96   0  |   4   0  95   0  |   8   0  89   2
            3    |  14   2   0  82  |  15   3   2  79  |  13   4   0  82
          Confidence level |  81%   |       80%        |       78%
----------------------------------------------------------------------------
HLA         0    |  88   1   0   9  |  89   1   0   8  |  92   1   1   4
            1    |   0  93   6   0  |   0  94   6   0  |   0  89  10   0
            2    |   0   2  97   0  |   0   2  97   0  |   0   3  96   0
            3    |  20   0  12  67  |  20   1  11  66  |  27   3  10  58
          Confidence level |  75%   |       74%        |       71%
----------------------------------------------------------------------------
PCA         0    |  98   0   0   0  |  95   0   2   2  |  92   1   3   3
            1    |   0  99   0   0  |   0  99   1   0  |   0  99   0   0
            2    |   6   0  90   2  |   6   1  84   7  |   7   0  86   6
            3    |   4   0   0  94  |   5   0   0  94  |   5   0   1  93
          Confidence level |  88%   |       83%        |       81%

3.2  Results  using  CNAPS

Further testing was performed with a commercial ANN software package
known as "BrainMaker," developed by California Scientific Software [20],
which was run on a general-purpose digital machine called CNAPS
(Connected Network of Adaptive Processors) [21-23]. CNAPS, manufactured
by Adaptive Solutions, is based on VLSI technology and is capable of high
neural network performance. CNAPS is an SIMD (single instruction
stream, multiple data stream) machine consisting of an array of 128 digital
signal processors operating in parallel, significantly accelerating both the
training and testing of ANNs. An 8-chip CNAPS system running at a mere
25 MHz, for example, can perform 12.8 billion multiply accumulates per
second [9]. The best efficiencies are obtained with very large nets where up
to 128 nodes in the same layer may be processed simultaneously, but
smaller nets can gain some benefit as well.

The BrainMaker package running on the CNAPS hardware can train and
run ANNs with exactly one input layer, one hidden layer, and one output
layer using the standard backpropagation training algorithm. The user can
set a training tolerance so that only those training examples with a root
mean square (RMS) error above the tolerance will cause the weights to be
updated. During testing, a user-supplied tolerance is used to determine
the correctness of the net's answers. Test examples are scored as correct if
the RMS error is below the tolerance, and they are scored as incorrect if the
RMS error exceeds the tolerance. This is a somewhat conservative
criterion, in that an example might have the highest activation in the
correct output node, but still count as a misclassification if the RMS error
were high.
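
The difference between tolerance scoring and highest-activation scoring can be shown with a small sketch. It is not BrainMaker code: the function names, the tolerance of 0.2, and the example activations are hypothetical, and the RMS error is assumed to be taken over the output nodes of a single example.

```python
import numpy as np

def rms_error(output, target):
    """Root mean square error across the output nodes of one example."""
    return float(np.sqrt(np.mean((output - target) ** 2)))

def scored_correct(output, target, tolerance):
    """Tolerance scoring: correct only if the RMS error is below the tolerance."""
    return rms_error(output, target) < tolerance

target = np.array([0.0, 0.0, 1.0, 0.0])        # the example belongs to class 2
output = np.array([0.40, 0.30, 0.60, 0.35])    # hypothetical net activations

winner = int(np.argmax(output))                # highest-activation decision
rms = rms_error(output, target)
passes = scored_correct(output, target, tolerance=0.2)
```

Here the correct node has the highest activation, yet the RMS error of about 0.36 exceeds the tolerance, so the example is counted as a misclassification under tolerance scoring.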

For testing, a trial and error approach was used to find a good net
configuration for the PCA data and for the HLA data. Training times were
fixed at 2000 epochs for these tests. We determined the performance for each
data set by averaging the results of 25 tests.

We used a random process to select 10 percent of the data set for testing
and 90 percent for training. Five different training/testing set divisions
were made, and five different training/testing cycles were performed for
each division, for a total of 25 different tests. The percentage of correct
classifications on the test set for each test was averaged to provide a single
score. The average correct classification on the test sets was 90.8 percent
for the HLA features and 96.8 percent for the PCA features. Training times
were considerably shorter on the CNAPS card than in the software
implementation: for example, for a neural network of 11 inputs, 11 hidden
nodes, and 4 output nodes, using epoch training of 4000 iterations and
516 feature vectors, the software trained in 13 minutes, whereas the CNAPS
would perform the same training in 90 s.

4.  Conclusions


The PCA features show a marked improvement over the HLA feature set
for correct classification and confidence levels for all classes and a slight
improvement over the PSE feature space. The most notable improvement
was in the identification of class 3, where PCA feature extraction gave a far
more robust and stable feature vector for this target class. Both of the other
feature sets have difficulty discriminating this target class. The largest
degree of misclassification for all feature vectors occurs between class 0 and
class 3, where as much as 27-percent misclassification occurs with the HLA
features (see table 2). This result was also observed when hierarchical
k-means clustering analysis was used to derive data clusters for the four
classes (unpublished findings); again a great deal of crossover occurs for
these two classes, with a lesser degree between class 2 and class 3. It is
also interesting that even though PCA performs so well on the average, its
performance in the classification of class 2 (a 12-cylinder diesel engine
truck) is unexpectedly poor: this vehicle is very loud, with a characteristic
signature, and has been classified with 100-percent accuracy using maximum
likelihood methods in the past [3]. We would expect to see similar results
with the PCA features, but this is not the case; perhaps the distinction is
degraded by the fact that classes 0, 2, and 3 are closely related, and in
this instance, the data were collected from the same environment. Further
testing of trained neural networks with appropriate class data collected from
other test sites should allow us to resolve this issue. We should be careful
in considering the results for class 1 since it had a small representation in
the training and testing: only one data file (albeit quite large) was used
for this vehicle class. Class 1 data were also the only representative data
collected at Aberdeen Proving Ground.

It is not surprising that the PSE feature space produced such a high degree
of correct classification. The PSE results indicate that the narrowband
features for each class are indeed highly class specific; a feature method
that maintains some of the "brute force" frequency and amplitude resolution
characteristic of PSE with lower dimensionality may be ideal. Despite the
PSE's simplicity and performance, we expect that the classification results
will drop for targets evaluated under different background environments,
since the algorithm inherently has a high degree of sensitivity to
environmental variables. Apparently, HLA features are lacking in some
necessary narrowband components for a higher degree of correct
classification.

The choice of using a simple backpropagation neural network classifier is
supported not only by the results but also by theory: the backpropagation
algorithm is not only simple in implementation but will closely
approximate the Bayes error with increased training [9]. Preliminary results
with the CNAPS card also support the notion that it will provide advantages
to a fielded neural network classifier. When retraining is necessary, the
CNAPS implementation will significantly enhance overall system
robustness by its processing speed alone.

5.  Future  Considerations


For future work, the three feature spaces have to be evaluated for their
complexity and real-time implementation. The downside to the PCA is
that it is difficult to implement without resorting to the incorporation of a
preprocessing feedforward neural network [6,9-12,16] or to periodic
eigenanalysis of the acoustic data. The preprocessing neural network
could be initialized in the field with scenario-based "eigenclusters"
determined by the use of clustering techniques and derived for a set of
classes that one would expect to encounter. These eigenclusters, which would
be class specific (i.e., tracked versus wheeled), would be a one-dimensional
application of Sirovich and Kirby's "eigenpicture" method for
classification [24]. Any subsequent retraining would be performed only when
misclassification grows beyond some threshold. The instantaneous
estimate of the correlation matrices by the simple matrix technique in this
report is simply too time consuming when large data blocks are concerned,
and thus direct procedures to calculate the principal components for each
time block probably cannot be employed. Alternatively, we could average
the autocorrelation matrix estimates over the entire 512-sample block,
calculate the eigenvectors, and then transform each subblock to generate
principal component estimates. Although this approach gives a considerable
savings computationally and is less sensitive to signal-to-noise issues,
it requires that we assume stationary signals over the sample block of
interest. For nonstationary signals, the errors associated with this
procedure may make the formation of principal components irrelevant. This
technique will be investigated further because it may prove promising in
"PCA-like" feature extraction.

Although the simple PSE feature space is readily derived and can be used
rapidly for identification, preliminary results have shown that it is
sensitive to the environment, and misclassification can grow substantially.
Also, it is very time-consuming in the training stage; without the
implementation of VLSI, it is cumbersome for real-time applications. We will
look into employing CNAPS using this simple feature vector in the future.

The HLA feature vector also has limitations in real-time implementation.
Several steps are required to perform the harmonic matching, which must
be tailored to meet real-time criteria. The classification results for the HLA
feature space are acceptable; this feature space has the added advantage
that it can generate several feature vectors for the same input sequence. It
can therefore readily adapt to a multitarget case, where one would like to
derive feature vectors for each target present and thus perform multitarget
classification. A generalized HLA algorithm should be investigated for
performing the multitarget feature extraction. We will use adaptive
beamforming to take care of the multitarget issue, but HLA may produce
multiple feature vectors even when an adaptive beamforming technique
fails because the targets are too near each other. Adaptive beamforming is
limited in detecting two closely spaced targets, whereas HLA would not
have this limit, since it will operate on the received power spectrum and
select multiple feature vector examples.

Further analysis will also be performed on the optimum number of
features per feature vector for PCA and HLA feature extraction techniques. In
the work reported here, the selection for feature vector dimensionality was
"ad hoc" at best (primarily driven by the existing HLA algorithm feature
space); both classification results and class separability analysis may show
that the dimensionality can be reduced further. Preliminary results with
the HLA features suggest this to be true for classification of the four-target
case. The number of classes included in the analysis should also be
extended. More importantly, the four-class problem should be further
investigated with data sets recorded in several different environments.
Tolerance with respect to the environment is of paramount importance in the
evaluation of a feature extraction algorithm for correct classification. We
have found that the PSE method is sensitive to environmental conditions
in preliminary studies.

Finally, we will investigate features based on wavelet filters, which have
been successfully applied in speech recognition and waveform
classification [25,26].